Students taught by multimodal teachers are superior action recognizers
The focal point of egocentric video understanding is modelling hand-object
interactions. Standard models, such as CNNs and Vision Transformers, which
receive RGB frames as input perform well; however, their performance improves
further when additional modalities, such as object detections, optical flow, or
audio, are used as input. The added complexity of the required
modality-specific modules, on the other hand, makes these models impractical
for deployment. The goal of this work is to retain the performance of such
multimodal approaches, while using only the RGB images as input at inference
time. Our approach is based on multimodal knowledge distillation, featuring a
multimodal teacher (in the current experiments trained using only object
detections, optical flow, and RGB frames) and a unimodal student (using only RGB
frames as input). We present preliminary results demonstrating that the
resulting model, distilled from a multimodal teacher, significantly outperforms
the baseline RGB model (trained without knowledge distillation), as well as an
omnivorous version of itself (trained on all modalities jointly), in both
standard and compositional action recognition.
Comment: Extended abstract accepted at the 2nd Ego4D Workshop @ ECCV 2022
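A minimal sketch of the kind of distillation objective this setup suggests (not the authors' implementation); the temperature T, the mixing weight alpha, and the teacher/student call signatures are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on ground-truth labels plus KL divergence to the
    (frozen) multimodal teacher's temperature-softened predictions."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl

# Hypothetical training step: the teacher consumes RGB, flow, and object
# detections, while the student sees only the RGB frames.
# with torch.no_grad():
#     teacher_logits = teacher(rgb, flow, detections)
# loss = distillation_loss(student(rgb), teacher_logits, labels)
```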
Linking Surface Facts to Large-Scale Knowledge Graphs
Open Information Extraction (OIE) methods extract facts from natural language
text in the form of ("subject"; "relation"; "object") triples. These facts are,
however, merely surface forms, the ambiguity of which impedes their downstream
usage; e.g., the surface phrase "Michael Jordan" may refer to either the former
basketball player or the university professor. Knowledge Graphs (KGs), on the
other hand, contain facts in a canonical (i.e., unambiguous) form, but their
coverage is limited by a static schema (i.e., a fixed set of entities and
predicates). To bridge this gap, we need the best of both worlds: (i) high
coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of
KGs. In order to achieve this goal, we propose a new benchmark with novel
evaluation protocols that can, for example, measure fact linking performance on
a granular triple slot level, while also measuring if a system has the ability
to recognize that a surface form has no match in the existing KG. Our extensive
evaluation of several baselines shows that detection of out-of-KG entities and
predicates is more difficult than accurate linking to existing ones, thus
calling for more research efforts on this difficult task. We publicly release
all resources (data, benchmark, and code) at
https://github.com/nec-research/fact-linking
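A minimal sketch of slot-level evaluation in the spirit of the benchmark described above; the triple layout, the "NIL" marker for out-of-KG items, and the scoring rule are assumptions, not the released protocol:

```python
from typing import List, Tuple

# A linked fact: (subject, relation, object), each a KG identifier or "NIL".
Triple = Tuple[str, str, str]

def slot_level_scores(predicted: List[Triple], gold: List[Triple]):
    """Per-slot linking accuracy plus recall of out-of-KG ("NIL") slots."""
    slot_correct = [0, 0, 0]
    nil_total = nil_found = 0
    for pred, ref in zip(predicted, gold):
        for i, (p, g) in enumerate(zip(pred, ref)):
            slot_correct[i] += int(p == g)
            if g == "NIL":
                nil_total += 1
                nil_found += int(p == "NIL")
    n = len(gold)
    per_slot = {name: slot_correct[i] / n
                for i, name in enumerate(("subject", "relation", "object"))}
    nil_recall = nil_found / nil_total if nil_total else None
    return per_slot, nil_recall

# Example: the subject and relation link to (made-up) KG ids, while the
# object slot has no match in the KG and should be recognised as NIL.
print(slot_level_scores(
    predicted=[("Q41421", "P106", "NIL")],
    gold=[("Q41421", "P106", "NIL")],
))
```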
Multimodal Distillation for Egocentric Action Recognition
The focal point of egocentric video understanding is modelling hand-object
interactions. Standard models, e.g. CNNs or Vision Transformers, which receive
RGB frames as input perform well. However, their performance improves further
by employing additional input modalities that provide complementary cues, such
as object detections, optical flow, audio, etc. The added complexity of the
modality-specific modules, on the other hand, makes these models impractical
for deployment. The goal of this work is to retain the performance of such a
multimodal approach, while using only the RGB frames as input at inference
time. We demonstrate that for egocentric action recognition on the
Epic-Kitchens and the Something-Something datasets, students which are taught
by multimodal teachers tend to be more accurate and better calibrated than
architecturally equivalent models trained on ground truth labels in a unimodal
or multimodal fashion. We further adopt a principled multimodal knowledge
distillation framework, allowing us to deal with issues which occur when
applying multimodal knowledge distillation in a naive manner. Lastly, we
demonstrate the achieved reduction in computational complexity, and show that
our approach maintains higher performance as the number of input views is
reduced. We release our code at
https://github.com/gorjanradevski/multimodal-distillation.
Comment: Accepted at ICCV 2023; Codebase released at https://github.com/gorjanradevski/multimodal-distillation
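Since the abstract highlights that distilled students are better calibrated, a common way to quantify calibration is the expected calibration error; the 15-bin scheme below is a standard choice and not necessarily the paper's exact evaluation:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: average |accuracy - confidence| over equal-width confidence bins,
    weighted by the fraction of samples falling into each bin."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece
```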
Cohort-derived machine learning models for individual prediction of chronic kidney disease in people living with HIV: a prospective multicentre cohort study.
BACKGROUND
It is unclear whether data-driven machine learning models, which are trained on large epidemiological cohorts, may improve prediction of co-morbidities in people living with HIV.
METHODS
In this proof-of-concept study, we included people living with HIV from the prospective Swiss HIV Cohort Study with a first estimated glomerular filtration rate (eGFR) >60 ml/min/1.73 m2 after January 1, 2002. Our primary outcome was chronic kidney disease (CKD), defined as a confirmed decrease in eGFR to ≤60 ml/min/1.73 m2 in measurements at least three months apart. We split the cohort data into a training set (80%), a validation set (10%), and a test set (10%), stratified for CKD status and follow-up length.
RESULTS
Of 12,761 eligible individuals (median baseline eGFR, 103 ml/min/1.73 m2), 1,192 (9%) developed CKD after a median of eight years. We used 64 static and 502 time-changing variables. Across prediction horizons and algorithms, and in contrast to expert-based standard models, most machine learning models achieved state-of-the-art predictive performance, with areas under the receiver operating characteristic curve and the precision-recall curve ranging from 0.926 to 0.996 and from 0.631 to 0.956, respectively.
CONCLUSIONS
In people living with HIV, we observed state-of-the-art performance in forecasting individual CKD onset with different machine learning algorithms.
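For context, the stratified split and the two reported metrics (area under the ROC curve and under the precision-recall curve) can be computed with scikit-learn; the variable names and the 80/10/10 split below are illustrative, not the study's pipeline, and stratification by follow-up length is omitted for brevity:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

def split_and_evaluate(X, y, model):
    """80/10/10 split stratified on the outcome, then AUROC / AUPRC on the test set."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    return {
        "auroc": roc_auc_score(y_test, scores),
        "auprc": average_precision_score(y_test, scores),
    }
```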
A multi-modal AI approach for intuitively instructable autonomous systems
We present a multi-modal AI framework to intuitively instruct and control Automated Guided Vehicles. We define a general multi-modal AI architecture with loose coupling between three AI modules: spoken language understanding, visual perception, and reinforcement learning navigation. We use the same multi-modal architecture for two different use cases implemented on two different platforms: an off-road vehicle that can pick objects, and an indoor forklift that performs automated warehouse inventory. We show that the proposed architecture can be used for a wide range of tasks and can be implemented on different hardware, demonstrating a high degree of modularity.
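The loose coupling described above might look like independent module interfaces composed by a thin coordinator; the interfaces, method names, and message type below are hypothetical illustrations, not the paper's actual APIs:

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple

@dataclass
class Command:
    action: str              # e.g. "pick" or "navigate"
    target: Optional[str]    # e.g. an object label or a location name

class SpokenLanguageUnderstanding(Protocol):
    def parse(self, utterance: str) -> Command: ...

class VisualPerception(Protocol):
    def locate(self, target: str) -> Tuple[float, float]: ...

class RLNavigation(Protocol):
    def drive_to(self, position: Tuple[float, float]) -> None: ...

def execute(utterance: str, slu: SpokenLanguageUnderstanding,
            vision: VisualPerception, nav: RLNavigation) -> None:
    """Coordinator: each module can be swapped independently (loose coupling)."""
    command = slu.parse(utterance)
    if command.target is not None:
        nav.drive_to(vision.locate(command.target))
```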